Goto

Collaborating Authors

 parallel composition


How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy

Ponomareva, Natalia, Xu, Zheng, McMahan, H. Brendan, Kairouz, Peter, Rosenblatt, Lucas, Cohen-Addad, Vincent, Guzmán, Cristóbal, McKenna, Ryan, Andrew, Galen, Bie, Alex, Yu, Da, Kurakin, Alex, Zadimoghaddam, Morteza, Vassilvitskii, Sergei, Terzis, Andreas

arXiv.org Machine Learning

High quality data is needed to unlock the full potential of AI for end users. However finding new sources of such data is getting harder: most publicly-available human generated data will soon have been used. Additionally, publicly available data often is not representative of users of a particular system -- for example, a research speech dataset of contractors interacting with an AI assistant will likely be more homogeneous, well articulated and self-censored than real world commands that end users will issue. Therefore unlocking high-quality data grounded in real user interactions is of vital interest. However, the direct use of user data comes with significant privacy risks. Differential Privacy (DP) is a well established framework for reasoning about and limiting information leakage, and is a gold standard for protecting user privacy. The focus of this work, \emph{Differentially Private Synthetic data}, refers to synthetic data that preserves the overall trends of source data,, while providing strong privacy guarantees to individuals that contributed to the source dataset. DP synthetic data can unlock the value of datasets that have previously been inaccessible due to privacy concerns and can replace the use of sensitive datasets that previously have only had rudimentary protections like ad-hoc rule-based anonymization. In this paper we explore the full suite of techniques surrounding DP synthetic data, the types of privacy protections they offer and the state-of-the-art for various modalities (image, tabular, text and decentralized). We outline all the components needed in a system that generates DP synthetic data, from sensitive data handling and preparation, to tracking the use and empirical privacy testing. We hope that work will result in increased adoption of DP synthetic data, spur additional research and increase trust in DP synthetic data approaches.


Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information

Corazza, Jan, Aria, Hadi Partovi, Kim, Hyohun, Neider, Daniel, Xu, Zhe

arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.


A Modular Framework for Motion Planning using Safe-by-Design Motion Primitives

Vukosavljev, Marijan, Kroeze, Zachary, Schoellig, Angela P., Broucke, Mireille E.

arXiv.org Artificial Intelligence

We present a modular framework for solving a motion planning problem among a group of robots. The proposed framework utilizes a finite set of low level motion primitives to generate motions in a gridded workspace. The constraints on allowable sequences of motion primitives are formalized through a maneuver automaton. At the high level, a control policy determines which motion primitive is executed in each box of the gridded workspace. We state general conditions on motion primitives to obtain provably correct behavior so that a library of safe-by-design motion primitives can be designed. The overall framework yields a highly robust design by utilizing feedback strategies at both the low and high levels. We provide specific designs for motion primitives and control policies suitable for multi-robot motion planning; the modularity of our approach enables one to independently customize the designs of each of these components. Our approach is experimentally validated on a group of quadrocopters.


Compositional Active Learning of Synchronizing Systems through Automated Alphabet Refinement

Henry, Leo, Neele, Thomas, Mousavi, Mohammad Reza, Sammartino, Matteo

arXiv.org Artificial Intelligence

Active automata learning infers automaton models of systems from behavioral observations, a technique successfully applied to a wide range of domains. Compositional approaches for concurrent systems have recently emerged. We take a significant step beyond available results, including those by the authors, and develop a general technique for compositional learning of a synchronizing parallel system with an unknown decomposition. Our approach automatically refines the global alphabet into component alphabets while learning the component models. We develop a theoretical treatment of distributions of alphabets, i.e., sets of possibly overlapping component alphabets. We characterize counter-examples that reveal inconsistencies with global observations, and show how to systematically update the distribution to restore consistency. We present a compositional learning algorithm implementing these ideas, where learning counterexamples precisely correspond to distribution counterexamples under well-defined conditions. We provide an implementation, called CoalA, using the state-of-the-art active learning library LearnLib. Our experiments show that in more than 630 subject systems, CoalA delivers orders of magnitude improvements (up to five orders) in membership queries and in systems with significant concurrency, it also achieves better scalability in the number of equivalence queries.


The $\mu\mathcal{G}$ Language for Programming Graph Neural Networks

Belenchia, Matteo, Corradini, Flavio, Quadrini, Michela, Loreti, Michele

arXiv.org Artificial Intelligence

Graph neural networks form a class of deep learning architectures specifically designed to work with graph-structured data. As such, they share the inherent limitations and problems of deep learning, especially regarding the issues of explainability and trustworthiness. We propose $\mu\mathcal{G}$, an original domain-specific language for the specification of graph neural networks that aims to overcome these issues. The language's syntax is introduced, and its meaning is rigorously defined by a denotational semantics. An equivalent characterization in the form of an operational semantics is also provided and, together with a type system, is used to prove the type soundness of $\mu\mathcal{G}$. We show how $\mu\mathcal{G}$ programs can be represented in a more user-friendly graphical visualization, and provide examples of its generality by showing how it can be used to define some of the most popular graph neural network models, or to develop any custom graph processing application.


Mixed Nondeterministic-Probabilistic Automata: Blending graphical probabilistic models with nondeterminism

Benveniste, Albert, Raclet, Jean-Baptiste

arXiv.org Artificial Intelligence

Graphical models in probability and statistics are a core concept in the area of probabilistic reasoning and probabilistic programming-graphical models include Bayesian networks and factor graphs. In this paper we develop a new model of mixed (nondeterministic/probabilistic) automata that subsumes both nondeterministic automata and graphical probabilistic models. Mixed Automata are equipped with parallel composition, simulation relation, and support message passing algorithms inherited from graphical probabilistic models. Segala's Probabilistic Automatacan be mapped to Mixed Automata.


Reward Machines for Cooperative Multi-Agent Reinforcement Learning

Neary, Cyrus, Xu, Zhe, Wu, Bo, Topcu, Ufuk

arXiv.org Artificial Intelligence

In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal. We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task. The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents. We define such a notion of RM decomposition and present algorithmically verifiable conditions guaranteeing that distributed completion of the sub-tasks leads to team behavior accomplishing the original task. This framework for task decomposition provides a natural approach to decentralized learning: agents may learn to accomplish their sub-tasks while observing only their local state and abstracted representations of their teammates. We accordingly propose a decentralized q-learning algorithm. Furthermore, in the case of undiscounted rewards, we use local value functions to derive lower and upper bounds for the global value function corresponding to the team task. Experimental results in three discrete settings exemplify the effectiveness of the proposed RM decomposition approach, which converges to a successful team policy two orders of magnitude faster than a centralized learner and significantly outperforms hierarchical and independent q-learning approaches.


Privacy-Preserving Gradient Boosting Decision Trees

Li, Qinbin, Wu, Zhaomin, Wen, Zeyi, He, Bingsheng

arXiv.org Machine Learning

The Gradient Boosting Decision Tree (GBDT) is a popular machine learning model for various tasks in recent years. In this paper, we study how to improve model accuracy of GBDT while preserving the strong guarantee of differential privacy. \textit{Sensitivity} and \textit{privacy budget} are two key design aspects for the effectiveness of differential private models. Existing solutions for GBDT with differential privacy suffer from the significant accuracy loss due to too loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds lead to more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.


Risk Structures: Towards Engineering Risk-aware Autonomous Systems

Gleirscher, Mario

arXiv.org Artificial Intelligence

Inspired by widely-used techniques of causal modelling in risk, failure, and accident analysis, this work discusses a compositional framework for risk modelling. Risk models capture fragments of the space of risky events likely to occur when operating a machine in a given environment. Moreover, one can build such models into machines such as autonomous robots, to equip them with the ability of risk-aware perception, monitoring, decision making, and control. With the notion of a risk factor as the modelling primitive, the framework provides several means to construct and shape risk models. Relational and algebraic properties are investigated and proofs support the validity and consistency of these properties over the corresponding models. Several examples throughout the discussion illustrate the applicability of the concepts. Overall, this work focuses on the qualitative treatment of risk with the outlook of transferring these results to probabilistic refinements of the discussed framework.


Programming by Example Using Least General Generalizations

Raza, Mohammad (Microsoft Research) | Gulwani, Sumit (Microsoft Research) | Milic-Frayling, Natasa (Microsoft Research)

AAAI Conferences

Recent advances in Programming by Example (PBE) have supported new applications to text editing, but existing approaches are limited to simple text strings. In this paper we address transformations in richly formatted documents, using an approach based on the idea of least general generalizations from inductive inference, which avoids the scalability issues faced by state-of-the-art PBE methods. We describe a novel domain specific language (DSL) that expresses transformations over XML structures describing richly formatted content, and a synthesis algorithm that generates a minimal program with respect to a natural subsumption ordering in our DSL. We present experimental results on tasks collected from online help forums, showing an average of 4.17 examples required for task completion.